Skip to content

Conversation

@Userwhite
Copy link
Contributor

@Userwhite Userwhite commented Jul 12, 2025

What problem does this PR solve?

Issue Number: close #53139

Problem Summary:

Release note

use stream load async return to optimize the performance under high concurrency, we treat send_reply as callback of exec_plan_framgnet .

    1. because pipeline may invoke callback when prepare failed , so we use is_prepare_success to ensure no dead lock at on_header.
    1. ctx-> _finish_send_reply ensure send_reply only be invoke one time and request is valid. becuase we can't decide if callback will send_reply again when load fail or load timeout.
    1. ctx-> _can_send_reply ensure send_reply only when on_header fail or handle finished, or sender will receive broken-pipe when meet data-quality error

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

曹俊辉 added 3 commits July 12, 2025 12:04
…e better http performance

* fix some bug for partition

* fix for thrift

* fix the thrift exit bug

* Revert "[feat](http): use async reply to provide better http performance

* ensure free order

* [Fix](stream-load) Fix stream load stuck under high concurrency (apache#36772)

When the concurrency of streamload exceeds the number of threads in the
remote scanner, streamload may get stuck. The reason is that the
libevent thread blocks and waits for streamload to complete, and when
there is no intersection between the tasks handled by the scanner thread
and the libevent thread, it gets stuck.
The solution is to convert the synchronous waiting tasks of libevent
into asynchronous execution by using callbacks in the streamload
executor thread.

See merge request: !740"
Revert commit d9e74efa762c8161a5ca3df4290bbd0ab896f1ef

See merge request: !745"
Revert commit 396cb2ec7e0b1a21bc0d7424c627f0d9321884bc
* fix missing ;


* fix stream load block


See merge request: !855
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

}

int StreamLoadAction::on_header(HttpRequest* req) {
req->mark_send_reply();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这么改完之后 所有的请求都走async了?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

只有 StreamLoad 请求,另外只是提交 reply 是 async 的,客户端感知仍然是同步的。

}

private:
SendReplyType _send_reply_type = REPLY_SYNC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个初始值是不是没啥用 因为 on_header 把它改写了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

还有其他 HttpAction 会用到,他们需要默认是 SYNC 的。

gavinchou
gavinchou previously approved these changes Jul 14, 2025
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 14, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@dataroaring
Copy link
Contributor

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jul 14, 2025
gavinchou
gavinchou previously approved these changes Jul 14, 2025
@Userwhite
Copy link
Contributor Author

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 14, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

曹俊辉 added 4 commits July 15, 2025 10:55
…e better http performance

* fix some bug for partition

* fix for thrift

* fix the thrift exit bug

* Revert "[feat](http): use async reply to provide better http performance

* ensure free order

* [Fix](stream-load) Fix stream load stuck under high concurrency (apache#36772)

When the concurrency of streamload exceeds the number of threads in the
remote scanner, streamload may get stuck. The reason is that the
libevent thread blocks and waits for streamload to complete, and when
there is no intersection between the tasks handled by the scanner thread
and the libevent thread, it gets stuck.
The solution is to convert the synchronous waiting tasks of libevent
into asynchronous execution by using callbacks in the streamload
executor thread.

See merge request: !740"
Revert commit d9e74efa762c8161a5ca3df4290bbd0ab896f1ef

See merge request: !745"
Revert commit 396cb2ec7e0b1a21bc0d7424c627f0d9321884bc
* fix missing ;


* fix stream load block


See merge request: !855
@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.03% (125/142) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.17% (25414/34731)
Line Coverage 60.39% (270003/447065)
Region Coverage 55.97% (227693/406779)
Branch Coverage 57.07% (96879/169743)

@Userwhite
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 31950 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6383de4b08396780021f95fa851a20f0428b15aa, data reload: false

------ Round 1 ----------------------------------
q1	17643	4236	4057	4057
q2	2030	361	240	240
q3	10172	1244	738	738
q4	10220	920	327	327
q5	7554	2115	1906	1906
q6	190	165	135	135
q7	893	811	661	661
q8	9289	1380	1078	1078
q9	5228	4777	4622	4622
q10	6728	1808	1433	1433
q11	496	322	284	284
q12	691	736	566	566
q13	17819	3804	3132	3132
q14	289	293	273	273
q15	590	550	502	502
q16	671	684	656	656
q17	710	742	620	620
q18	7017	6492	6395	6395
q19	1096	951	621	621
q20	398	368	248	248
q21	3132	2551	2478	2478
q22	1072	1014	978	978
Total cold run time: 103928 ms
Total hot run time: 31950 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4095	4078	4060	4060
q2	334	403	325	325
q3	2083	2563	2240	2240
q4	1304	1741	1281	1281
q5	4106	4020	4072	4020
q6	209	176	133	133
q7	1878	1801	1839	1801
q8	2671	2474	2449	2449
q9	7086	7079	7084	7079
q10	2551	2737	2315	2315
q11	543	475	463	463
q12	738	748	603	603
q13	3598	4148	3297	3297
q14	274	431	292	292
q15	578	499	512	499
q16	660	678	645	645
q17	1135	1263	1335	1263
q18	8090	7673	7851	7673
q19	917	928	897	897
q20	1986	2076	1909	1909
q21	4905	4664	4326	4326
q22	1135	1057	1035	1035
Total cold run time: 50876 ms
Total hot run time: 48605 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 173549 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6383de4b08396780021f95fa851a20f0428b15aa, data reload: false

query5	4550	581	435	435
query6	327	239	203	203
query7	4218	451	265	265
query8	321	257	236	236
query9	8740	2656	2635	2635
query10	497	364	317	317
query11	15225	15125	14957	14957
query12	175	112	111	111
query13	1261	509	375	375
query14	5967	2921	2752	2752
query14_1	2692	2637	2642	2637
query15	195	196	175	175
query16	961	460	436	436
query17	1084	645	561	561
query18	2417	432	334	334
query19	221	244	195	195
query20	126	118	112	112
query21	213	138	123	123
query22	3824	3948	4042	3948
query23	16108	15744	15503	15503
query23_1	15540	15474	15477	15474
query24	7636	1582	1206	1206
query24_1	1215	1197	1221	1197
query25	538	434	410	410
query26	1241	263	153	153
query27	2782	450	295	295
query28	4573	2215	2189	2189
query29	827	539	421	421
query30	307	243	211	211
query31	786	623	552	552
query32	78	71	66	66
query33	521	334	272	272
query34	907	898	535	535
query35	750	777	695	695
query36	820	863	903	863
query37	131	96	74	74
query38	2720	2709	2695	2695
query39	775	748	735	735
query39_1	711	705	710	705
query40	217	130	165	130
query41	65	62	61	61
query42	107	104	103	103
query43	485	433	380	380
query44	1330	755	754	754
query45	185	186	174	174
query46	887	970	614	614
query47	1369	1435	1383	1383
query48	310	326	241	241
query49	621	429	328	328
query50	637	273	208	208
query51	3722	3782	3719	3719
query52	99	112	97	97
query53	298	325	267	267
query54	279	250	258	250
query55	80	78	76	76
query56	306	279	275	275
query57	1000	1030	937	937
query58	267	248	237	237
query59	2116	2176	2033	2033
query60	319	312	294	294
query61	160	150	159	150
query62	377	347	311	311
query63	297	262	273	262
query64	5019	1324	1090	1090
query65	3772	3656	3629	3629
query66	1475	447	324	324
query67	15958	14755	14713	14713
query68	2777	1048	778	778
query69	433	352	318	318
query70	979	954	943	943
query71	313	306	282	282
query72	6348	3609	3509	3509
query73	598	719	311	311
query74	8713	8814	8583	8583
query75	2803	2820	2488	2488
query76	2870	1052	678	678
query77	339	382	280	280
query78	9641	10041	9310	9310
query79	1108	876	628	628
query80	1175	576	500	500
query81	534	263	233	233
query82	1353	138	108	108
query83	326	250	234	234
query84	258	122	102	102
query85	945	498	478	478
query86	400	323	326	323
query87	2860	2896	2736	2736
query88	3290	2292	2319	2292
query89	391	349	336	336
query90	1941	154	140	140
query91	172	167	140	140
query92	68	71	62	62
query93	993	906	570	570
query94	587	330	294	294
query95	571	318	304	304
query96	588	470	221	221
query97	2327	2351	2317	2317
query98	219	199	201	199
query99	593	554	489	489
Total cold run time: 249283 ms
Total hot run time: 173549 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 26.99 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6383de4b08396780021f95fa851a20f0428b15aa, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.12	0.11
query5	0.26	0.26	0.26
query6	1.15	0.67	0.65
query7	0.03	0.02	0.03
query8	0.05	0.04	0.04
query9	0.57	0.50	0.50
query10	0.56	0.55	0.55
query11	0.15	0.11	0.12
query12	0.16	0.12	0.12
query13	0.63	0.60	0.60
query14	1.01	0.99	0.99
query15	0.81	0.78	0.79
query16	0.42	0.41	0.39
query17	1.01	1.05	1.04
query18	0.22	0.22	0.21
query19	1.86	1.82	1.73
query20	0.02	0.01	0.02
query21	15.43	0.29	0.14
query22	4.73	0.05	0.05
query23	16.01	0.28	0.10
query24	1.33	0.23	0.70
query25	0.08	0.06	0.08
query26	0.14	0.13	0.12
query27	0.07	0.06	0.04
query28	4.42	1.06	0.88
query29	12.56	3.94	3.16
query30	0.26	0.14	0.11
query31	2.82	0.63	0.39
query32	3.23	0.55	0.47
query33	3.05	3.05	3.06
query34	16.57	5.19	4.58
query35	4.47	4.47	4.52
query36	0.67	0.51	0.49
query37	0.11	0.07	0.06
query38	0.07	0.05	0.04
query39	0.05	0.03	0.03
query40	0.17	0.15	0.14
query41	0.10	0.03	0.03
query42	0.05	0.03	0.02
query43	0.05	0.04	0.04
Total cold run time: 97.36 s
Total hot run time: 26.99 s

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 11.11% (14/126) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.39% (18966/35526)
Line Coverage 39.26% (175993/448245)
Region Coverage 33.80% (136073/402617)
Branch Coverage 34.75% (58796/169183)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 87.32% (124/142) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.22% (25081/34731)
Line Coverage 58.99% (263732/447065)
Region Coverage 53.91% (219278/406779)
Branch Coverage 55.45% (94117/169743)

@Userwhite
Copy link
Contributor Author

run buildall

@Userwhite
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 32001 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e3fd0ba4fd56f8e4109b3598db618f34d74a32a1, data reload: false

------ Round 1 ----------------------------------
q1	17755	4173	4099	4099
q2	2180	352	238	238
q3	10214	1309	767	767
q4	10230	867	333	333
q5	7510	2135	1851	1851
q6	199	176	143	143
q7	957	804	671	671
q8	9282	1429	1187	1187
q9	5335	4776	4644	4644
q10	7254	1805	1422	1422
q11	558	314	293	293
q12	688	740	613	613
q13	17825	3843	3133	3133
q14	291	297	271	271
q15	594	518	511	511
q16	677	690	617	617
q17	756	734	624	624
q18	6812	6335	6413	6335
q19	1107	975	604	604
q20	399	377	256	256
q21	3169	2537	2403	2403
q22	1069	1023	986	986
Total cold run time: 104861 ms
Total hot run time: 32001 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4089	4033	4060	4033
q2	348	373	322	322
q3	2118	2619	2263	2263
q4	1292	1722	1329	1329
q5	4136	4033	4070	4033
q6	207	170	132	132
q7	1928	2163	1863	1863
q8	2641	2403	2378	2378
q9	7204	7090	7096	7090
q10	2477	2733	2381	2381
q11	568	485	441	441
q12	728	802	623	623
q13	3667	3979	3544	3544
q14	294	315	271	271
q15	550	518	501	501
q16	678	697	623	623
q17	1151	1311	1409	1311
q18	8375	7757	7848	7757
q19	906	890	915	890
q20	2008	2101	1920	1920
q21	4936	4639	4273	4273
q22	1118	1029	964	964
Total cold run time: 51419 ms
Total hot run time: 48942 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 173277 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e3fd0ba4fd56f8e4109b3598db618f34d74a32a1, data reload: false

query5	5029	589	446	446
query6	332	227	206	206
query7	4221	462	275	275
query8	336	250	235	235
query9	8732	2612	2609	2609
query10	518	385	327	327
query11	15250	15042	14852	14852
query12	179	112	117	112
query13	1286	510	383	383
query14	6105	2968	2728	2728
query14_1	2656	2622	2681	2622
query15	199	196	176	176
query16	3196	478	440	440
query17	1077	674	561	561
query18	2186	449	334	334
query19	213	218	185	185
query20	128	116	114	114
query21	533	138	137	137
query22	3956	4119	3841	3841
query23	15953	15611	15354	15354
query23_1	15370	15347	15580	15347
query24	7436	1589	1207	1207
query24_1	1243	1216	1253	1216
query25	577	445	392	392
query26	1221	260	156	156
query27	2744	456	294	294
query28	4511	2190	2197	2190
query29	756	527	428	428
query30	343	232	216	216
query31	817	638	557	557
query32	76	75	64	64
query33	554	335	285	285
query34	898	890	540	540
query35	741	796	713	713
query36	871	914	843	843
query37	151	96	82	82
query38	2745	2790	2673	2673
query39	786	757	721	721
query39_1	728	718	717	717
query40	218	131	120	120
query41	69	66	62	62
query42	111	100	107	100
query43	462	441	457	441
query44	1334	751	751	751
query45	192	184	173	173
query46	859	972	634	634
query47	1411	1472	1314	1314
query48	319	336	250	250
query49	595	443	348	348
query50	664	289	217	217
query51	3785	3759	3814	3759
query52	109	109	100	100
query53	302	333	277	277
query54	310	280	262	262
query55	84	77	75	75
query56	302	305	310	305
query57	1041	1001	968	968
query58	291	264	252	252
query59	2179	2223	2196	2196
query60	346	330	313	313
query61	197	192	184	184
query62	411	374	324	324
query63	303	274	287	274
query64	4671	1411	1131	1131
query65	3823	3759	3693	3693
query66	1412	446	409	409
query67	15091	14741	14545	14545
query68	8277	1023	733	733
query69	499	343	315	315
query70	1007	942	927	927
query71	374	299	285	285
query72	6011	3404	3418	3404
query73	780	736	312	312
query74	8754	8811	8597	8597
query75	2886	2846	2483	2483
query76	3442	1061	677	677
query77	556	383	288	288
query78	9727	9935	9218	9218
query79	1246	921	605	605
query80	665	601	479	479
query81	518	266	232	232
query82	212	146	107	107
query83	266	267	254	254
query84	266	121	109	109
query85	907	508	470	470
query86	390	322	318	318
query87	2885	2860	2765	2765
query88	3272	2311	2316	2311
query89	388	350	335	335
query90	2084	164	155	155
query91	170	171	140	140
query92	73	68	63	63
query93	1122	912	563	563
query94	572	321	306	306
query95	589	339	362	339
query96	604	492	213	213
query97	2351	2389	2283	2283
query98	220	197	200	197
query99	620	576	494	494
Total cold run time: 256607 ms
Total hot run time: 173277 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 27.01 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e3fd0ba4fd56f8e4109b3598db618f34d74a32a1, data reload: false

query1	0.06	0.05	0.05
query2	0.17	0.05	0.05
query3	0.28	0.09	0.09
query4	1.61	0.11	0.12
query5	0.26	0.26	0.27
query6	1.16	0.67	0.66
query7	0.03	0.03	0.03
query8	0.06	0.04	0.04
query9	0.59	0.50	0.50
query10	0.55	0.54	0.55
query11	0.19	0.11	0.12
query12	0.16	0.13	0.13
query13	0.62	0.60	0.60
query14	1.01	1.00	0.97
query15	0.82	0.81	0.80
query16	0.39	0.37	0.40
query17	0.98	1.00	1.06
query18	0.23	0.22	0.22
query19	1.88	1.71	1.81
query20	0.02	0.02	0.01
query21	15.45	0.30	0.13
query22	4.92	0.05	0.06
query23	16.15	0.29	0.10
query24	1.96	0.53	0.57
query25	0.09	0.07	0.05
query26	0.13	0.13	0.14
query27	0.05	0.04	0.05
query28	4.71	1.05	0.88
query29	12.56	3.92	3.13
query30	0.28	0.15	0.12
query31	2.82	0.61	0.38
query32	3.23	0.56	0.47
query33	2.96	3.02	3.03
query34	16.78	5.10	4.45
query35	4.45	4.44	4.49
query36	0.66	0.51	0.49
query37	0.13	0.07	0.06
query38	0.08	0.04	0.04
query39	0.06	0.03	0.03
query40	0.17	0.15	0.14
query41	0.11	0.04	0.03
query42	0.06	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 98.92 s
Total hot run time: 27.01 s

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 11.11% (14/126) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.39% (18967/35526)
Line Coverage 39.27% (176015/448245)
Region Coverage 33.81% (136114/402617)
Branch Coverage 34.76% (58806/169183)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.81% (119/134) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.20% (25077/34731)
Line Coverage 58.94% (263507/447052)
Region Coverage 53.77% (218743/406776)
Branch Coverage 55.37% (93979/169741)

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Stream Load Async Return Optimization

Thanks for this PR! The optimization concept is sound - making HTTP replies asynchronous should improve thread utilization under high concurrency. I've traced through the synchronization logic and verified it handles race conditions correctly.

Critical Issues (Please Fix Before Merge)

1. Typo in Variable Name (http_request.h:108)

std::future<bool> _http_reply_futrue = _http_reply_promise.get_future();
//                         ^^^^^^ should be "future"

2. Hardcoded 3-Second Timeout (http_request.cpp:175)

auto status = _http_reply_futrue.wait_for(std::chrono::seconds(3));

3 seconds is too aggressive for legitimate long-running loads. Consider:

  • Making it configurable via config::stream_load_async_reply_timeout_s
  • Default to something like 300 seconds

3. Debug Stack Trace Logging (http_request.cpp:159)

VLOG_NOTICE << "finish send reply, infos=" << infos
            << ", stack=" << get_stack_trace(); // temp locate problem
  • get_stack_trace() is expensive even when VLOG is disabled (string is still constructed)
  • Comment says "temp locate problem" - should be removed before merge

Minor Issues

  1. Duplicate error logging - Same error logged in both handle() (line 122) and _send_reply() (line 157)

  2. Missing finish_send_reply() in send_error() - http_channel.cpp calls finish_send_reply() in send_reply() and send_file() but not in send_error(). Is this intentional?

  3. Suggestion for is_prepare_success - Using shared_ptr<bool> works but std::atomic<bool> would make the intent clearer

  4. Consider adding comments - The synchronization state machine with _can_send_reply, _finish_send_reply, and the CV is correct but complex. Comments explaining the wait conditions would help future maintainers.


Thread Safety Analysis ✅

I traced through these scenarios and verified the logic is correct:

Scenario Result
Normal success path ✅ Reply sent via callback
on_header failure ✅ Reply sent immediately, handle() skipped
Fragment prepare failure ✅ Callback not invoked, handle() sends reply
Request timeout _finish_send_reply=true prevents use-after-free

Overall

Concept: ✅ Sound optimization approach
Implementation: ⚠️ Needs polish (typo, timeout, debug logging)
Thread Safety: ✅ Correct

Recommend addressing the 3 critical issues before merge.

@dataroaring
Copy link
Contributor

Follow-up on send_error() missing finish_send_reply():

After deeper investigation, this is NOT a bug. Here's why:

  1. StreamLoadAction inherits from HttpHandler (not HttpHandlerWithAuth)
  2. StreamLoadAction never calls send_error() - it uses its own _send_reply() method for all error paths
  3. mark_send_reply() is only called in StreamLoadAction::on_header(), marking it as REPLY_ASYNC
  4. Other handlers (like HttpHandlerWithAuth) that use send_error() never call mark_send_reply(), so they stay in default REPLY_SYNC mode
  5. In REPLY_SYNC mode, wait_finish_send_reply() returns immediately without waiting

Therefore, send_error() doesn't need finish_send_reply() because it's never called from an async reply context. The code is correct as-is.

@Userwhite
Copy link
Contributor Author

Userwhite commented Jan 10, 2026

is_prepare_success must use std::shared_ptr
s_prepare_success is freed after _exec_env->fragment_mgr()->exec_plan_fragment() is called,
but the exec_fragment callback still needs to access it.

@Userwhite
Copy link
Contributor Author

run buildall

@Userwhite
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 31474 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ebb3f8e2a46fa984729197459954c85a0b3acfb9, data reload: false

------ Round 1 ----------------------------------
q1	17612	4184	4039	4039
q2	2013	364	243	243
q3	10158	1337	722	722
q4	10243	884	329	329
q5	8140	2107	1853	1853
q6	247	177	138	138
q7	926	803	660	660
q8	9264	1369	1167	1167
q9	5124	4689	4656	4656
q10	6812	1829	1405	1405
q11	498	293	282	282
q12	759	737	572	572
q13	17761	3845	3104	3104
q14	291	299	273	273
q15	600	523	495	495
q16	673	685	625	625
q17	684	803	536	536
q18	7007	6285	6409	6285
q19	1218	979	620	620
q20	389	368	247	247
q21	3030	2505	2268	2268
q22	1037	1019	955	955
Total cold run time: 104486 ms
Total hot run time: 31474 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4114	4050	4041	4041
q2	332	388	310	310
q3	2098	2576	2199	2199
q4	1351	1794	1314	1314
q5	4116	4065	4152	4065
q6	203	175	134	134
q7	1901	1820	2085	1820
q8	2527	2512	2451	2451
q9	7216	7135	7159	7135
q10	2524	2742	2306	2306
q11	598	499	490	490
q12	738	748	602	602
q13	3726	4017	3364	3364
q14	320	334	274	274
q15	612	598	544	544
q16	649	676	622	622
q17	1155	1330	1402	1330
q18	7918	7903	8000	7903
q19	848	879	827	827
q20	2017	2043	2032	2032
q21	5055	4478	4330	4330
q22	1178	1047	982	982
Total cold run time: 51196 ms
Total hot run time: 49075 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 173067 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ebb3f8e2a46fa984729197459954c85a0b3acfb9, data reload: false

query5	4414	609	428	428
query6	321	246	235	235
query7	4217	461	255	255
query8	327	244	224	224
query9	8725	2651	2700	2651
query10	485	370	308	308
query11	15289	15068	14748	14748
query12	168	124	125	124
query13	1247	492	367	367
query14	6112	2977	2722	2722
query14_1	2634	2614	2641	2614
query15	204	192	170	170
query16	968	490	460	460
query17	1094	675	596	596
query18	2474	425	325	325
query19	227	220	197	197
query20	123	121	116	116
query21	214	138	118	118
query22	4067	4076	3982	3982
query23	15993	15546	15633	15546
query23_1	15384	15456	15369	15369
query24	7324	1564	1176	1176
query24_1	1189	1171	1180	1171
query25	525	478	396	396
query26	1245	263	154	154
query27	2764	454	283	283
query28	4599	2145	2137	2137
query29	761	552	440	440
query30	305	239	212	212
query31	816	623	548	548
query32	72	69	68	68
query33	529	335	281	281
query34	891	884	517	517
query35	735	784	681	681
query36	824	874	843	843
query37	124	94	83	83
query38	2779	2739	2599	2599
query39	782	748	733	733
query39_1	718	727	727	727
query40	219	133	116	116
query41	69	62	63	62
query42	107	105	106	105
query43	482	494	455	455
query44	1364	725	708	708
query45	189	182	174	174
query46	861	1000	598	598
query47	1423	1453	1408	1408
query48	357	332	252	252
query49	632	436	343	343
query50	635	272	215	215
query51	3817	3766	3745	3745
query52	108	110	96	96
query53	315	327	293	293
query54	309	281	269	269
query55	87	82	75	75
query56	309	306	313	306
query57	999	1027	957	957
query58	276	262	262	262
query59	2158	2153	2056	2056
query60	332	326	308	308
query61	197	189	187	187
query62	403	370	311	311
query63	300	276	277	276
query64	5029	1421	1121	1121
query65	3796	3724	3742	3724
query66	1447	437	317	317
query67	15239	14766	15880	14766
query68	4773	1028	717	717
query69	511	367	324	324
query70	1049	969	837	837
query71	370	310	282	282
query72	6098	3420	3437	3420
query73	742	727	301	301
query74	8873	8712	8657	8657
query75	2835	2824	2452	2452
query76	3455	1068	648	648
query77	519	383	279	279
query78	9794	9961	9200	9200
query79	951	903	595	595
query80	1127	584	486	486
query81	541	263	233	233
query82	409	157	111	111
query83	371	263	243	243
query84	260	109	99	99
query85	925	514	453	453
query86	396	328	326	326
query87	2905	2895	2781	2781
query88	3245	2233	2220	2220
query89	387	360	336	336
query90	1919	161	153	153
query91	177	169	140	140
query92	67	68	62	62
query93	1049	882	531	531
query94	637	319	297	297
query95	578	377	311	311
query96	584	478	209	209
query97	2362	2402	2314	2314
query98	228	212	200	200
query99	566	578	509	509
Total cold run time: 251146 ms
Total hot run time: 173067 ms

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 11.29% (14/124) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.02% (18980/35797)
Line Coverage 39.09% (175899/449950)
Region Coverage 33.68% (136263/404617)
Branch Coverage 34.70% (58879/169701)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.64% (117/132) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.87% (25849/34992)
Line Coverage 61.27% (274942/448716)
Region Coverage 56.20% (229696/408729)
Branch Coverage 58.03% (98793/170246)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] use stream load async return to optimize the performance under high concurrency

5 participants